load balancer
A Meta-Heuristic Load Balancer for Cloud Computing Systems
Sliwko, Leszek, Getov, Vladimir
This is the accepted author's version of the paper. The final published version is available in the 2015 IEEE 39th Annual Computer Software and Applications Conference, vol. Abstract -- This paper presents a strategy to allocate services on a Cloud system without overloading nodes while maintaining system stability at minimum cost. We specify an abstract model of cloud resource utilization, including multiple types of resources as well as considerations for service migration costs. A prototype meta-heuristic load balancer is demonstrated, and experimental results are presented and discussed. We also propose a novel genetic algorithm, where the population is seeded with the outputs of other meta-heuristic algorithms. Modern-day applications are often designed so that they can simultaneously use resources from different computing environments. System components are no longer tied to individual machines, and in many respects they can be viewed as though they are deployed in a single application environment. Distributed computing differs from traditional computing in many ways.
- North America > United States > California > Alameda County > Berkeley (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
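The abstract's key idea, a genetic algorithm whose initial population is seeded with solutions from other heuristics, can be illustrated with a minimal sketch. This is not the paper's algorithm; the first-fit heuristic, fitness function (load variance), and GA parameters below are illustrative assumptions:

```python
import random

def first_fit(services, capacity, n_nodes):
    """Greedy heuristic: place each service on the first node with spare capacity."""
    loads = [0.0] * n_nodes
    plan = []
    for s in services:
        for i in range(n_nodes):
            if loads[i] + s <= capacity:
                loads[i] += s
                plan.append(i)
                break
        else:  # no node has room: fall back to the least-loaded node
            i = min(range(n_nodes), key=lambda j: loads[j])
            loads[i] += s
            plan.append(i)
    return plan

def fitness(plan, services, n_nodes):
    """Lower is better: variance of per-node load (imbalance)."""
    loads = [0.0] * n_nodes
    for node, s in zip(plan, services):
        loads[node] += s
    mean = sum(loads) / n_nodes
    return sum((l - mean) ** 2 for l in loads)

def seeded_ga(services, capacity, n_nodes, pop_size=20, generations=50, seed=0):
    rng = random.Random(seed)
    # Seed the population with a heuristic solution, then fill with random plans.
    population = [first_fit(services, capacity, n_nodes)]
    while len(population) < pop_size:
        population.append([rng.randrange(n_nodes) for _ in services])
    for _ in range(generations):
        population.sort(key=lambda p: fitness(p, services, n_nodes))
        survivors = population[: pop_size // 2]  # keep the best half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, len(services))
            child = a[:cut] + b[cut:]            # one-point crossover
            if rng.random() < 0.2:               # mutation: move one service
                child[rng.randrange(len(services))] = rng.randrange(n_nodes)
            children.append(child)
        population = survivors + children
    return min(population, key=lambda p: fitness(p, services, n_nodes))
```

Because the best plan always survives each generation, the GA's result is never worse than the heuristic seed it started from, which is the point of seeding.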
Glia: A Human-Inspired AI for Automated Systems Design and Optimization
Hamadanian, Pouya, Karimi, Pantea, Nasr-Esfahany, Arash, Noorbakhsh, Kimia, Chandler, Joseph, ParandehGheibi, Ali, Alizadeh, Mohammad, Balakrishnan, Hari
Can an AI autonomously design mechanisms for computer systems on par with the creativity and reasoning of human experts? We present Glia, an AI architecture for networked systems design that uses large language models (LLMs) in a human-inspired, multi-agent workflow. Each agent specializes in reasoning, experimentation, and analysis, collaborating through an evaluation framework that grounds abstract reasoning in empirical feedback. Unlike prior ML-for-systems methods that optimize black-box policies, Glia generates interpretable designs and exposes its reasoning process. When applied to a distributed GPU cluster for LLM inference, it produces new algorithms for request routing, scheduling, and auto-scaling that perform at human-expert levels in significantly less time, while yielding novel insights into workload behavior. Our results suggest that by combining reasoning LLMs with structured experimentation, an AI can produce creative and understandable designs for complex systems problems.
- Asia > Middle East > Jordan (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > California > Santa Clara County > Santa Clara (0.04)
- North America > United States > California > San Diego County > Carlsbad (0.04)
- Information Technology (0.46)
- Transportation (0.34)
Is the GPU Half-Empty or Half-Full? Practical Scheduling Techniques for LLMs
Kossmann, Ferdi, Fontaine, Bruce, Khudia, Daya, Cafarella, Michael, Madden, Samuel
Serving systems for Large Language Models (LLMs) improve throughput by processing several requests concurrently. However, multiplexing hardware resources between concurrent requests involves non-trivial scheduling decisions. Practical serving systems typically implement these decisions at two levels: First, a load balancer routes requests to different servers which each hold a replica of the LLM. Then, on each server, an engine-level scheduler decides when to run a request, or when to queue or preempt it. Improved scheduling policies may benefit a wide range of LLM deployments and can often be implemented as "drop-in replacements" to a system's current policy. In this work, we survey scheduling techniques from the literature and from practical serving systems. We find that schedulers from the literature often achieve good performance but introduce significant complexity. In contrast, schedulers in practical deployments often leave easy performance gains on the table but are easy to implement, deploy and configure. This finding motivates us to introduce two new scheduling techniques, which are both easy to implement, and outperform current techniques on production workload traces.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > California > San Diego County > Carlsbad (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
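The two-level structure the abstract describes (a cluster-level load balancer routing requests to replicas, plus an engine-level scheduler admitting or queueing them) can be sketched in a few lines. The class names, the least-loaded routing rule, and the FIFO admission policy are illustrative assumptions, not the paper's techniques:

```python
from collections import deque

class EngineScheduler:
    """Engine-level policy: run up to max_concurrent requests, queue the rest."""
    def __init__(self, max_concurrent):
        self.max_concurrent = max_concurrent
        self.running = 0
        self.queue = deque()

    def submit(self, request_id):
        if self.running < self.max_concurrent:
            self.running += 1
            return "running"
        self.queue.append(request_id)
        return "queued"

    def finish(self):
        # A completed request frees a slot; admit the next queued request, if any.
        if self.queue:
            self.queue.popleft()
        else:
            self.running -= 1

class Router:
    """Cluster-level policy: route each request to the least-loaded replica."""
    def __init__(self, servers):
        self.servers = servers

    def route(self, request_id):
        target = min(self.servers, key=lambda s: s.running + len(s.queue))
        return target, target.submit(request_id)
```

With three replicas each allowing two concurrent requests, the seventh request in a burst is the first to be queued rather than run immediately.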
Running a Stable Diffusion Cluster on GCP with tensorflow-serving (Part 2)
In part 1, we learned how to use Terraform to set up and manage our infrastructure conveniently. In this part, we will continue our journey and deploy a running Stable Diffusion model on the provisioned cluster. Note: you can follow this tutorial end-to-end even as a free user (as long as you have some free-tier credits left). Let's take a look at what the final result will be. If you gradually add a bit of noise to an image over many steps, you eventually end up with pure noise.
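The gradual-noising idea behind diffusion models can be sketched numerically. This is a toy illustration of the forward (noise-adding) process on a list of pixel values, not the tutorial's serving code; the step count and `beta` schedule are invented for the example:

```python
import math
import random

def diffuse(image, steps, beta=0.1, seed=0):
    """Toy forward diffusion: each step slightly shrinks the signal and
    mixes in Gaussian noise, so after many steps only noise remains."""
    rng = random.Random(seed)
    x = list(image)
    for _ in range(steps):
        x = [math.sqrt(1 - beta) * px + math.sqrt(beta) * rng.gauss(0, 1)
             for px in x]
    return x
```

After `steps` iterations, the surviving fraction of the original signal is `(1 - beta) ** (steps / 2)`, which is why enough steps leave an image that is indistinguishable from noise; the model is then trained to reverse this process.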
Creating a Machine Learning App using FastAPI and Deploying it Using Kubernetes
FastAPI is a modern Python-based web framework used to create Web APIs. FastAPI is fast at serving requests and enhances the performance of your application. Note: to follow along easily, use Google Colab; it's an easy-to-use platform for getting started quickly while building models. We will build a machine learning model that predicts the nationality of individuals from their names. This is a simple model that illustrates the key concepts used in machine learning modeling. The dataset contains common names of people and their nationalities. Pandas is a software library written for the Python programming language for data manipulation and analysis.
- Information Technology > Software > Programming Languages (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.75)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
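The shape of such an app can be sketched as a plain prediction function wrapped in a FastAPI endpoint. The suffix rules below are a toy stand-in for the tutorial's trained model, and the endpoint path is an invented example; the FastAPI import is guarded so the classifier also works standalone:

```python
# Toy name-to-nationality classifier: the suffix rules are invented
# placeholders, not the tutorial's actual trained model.
SUFFIX_RULES = {
    "ov": "Russian", "ez": "Spanish", "sen": "Danish",
    "oglu": "Turkish", "ski": "Polish",
}

def predict_nationality(name: str) -> str:
    lowered = name.lower()
    for suffix, nationality in SUFFIX_RULES.items():
        if lowered.endswith(suffix):
            return nationality
    return "Unknown"

try:
    from fastapi import FastAPI

    app = FastAPI()

    @app.get("/predict")
    def predict(name: str):
        # e.g. GET /predict?name=Petrov -> {"name": "Petrov", "nationality": "Russian"}
        return {"name": name, "nationality": predict_nationality(name)}
except ImportError:
    # FastAPI is optional here; the classifier itself has no dependencies.
    pass
```

For Kubernetes deployment, this module would typically be served with `uvicorn`, containerized, and exposed through a Service, as the article goes on to describe.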
Reinforced Workload Distribution Fairness
Yao, Zhiyuan, Ding, Zihan, Clausen, Thomas Heide
Network load balancers are central components in data centers that distribute workloads across multiple servers and thereby contribute to offering scalable services. However, when load balancers operate in dynamic environments with limited monitoring of application server loads, they rely on heuristic algorithms that require manual configuration for fairness and performance. To alleviate that, this paper proposes a distributed asynchronous reinforcement learning mechanism to improve the fairness of the workload distribution achieved by a load balancer, with no active load balancer state monitoring and only limited network observations. The performance of the proposed mechanism is evaluated and compared with state-of-the-art load balancing algorithms in a simulator, under configurations of progressively increasing complexity. Preliminary results show promise for RL-based load balancing algorithms, and identify additional challenges and future research directions, including reward function design and model scalability.
- Europe > France (0.04)
- Oceania > Australia > New South Wales > Sydney (0.04)
- North America > United States > New Jersey (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
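Workload distribution fairness, the quantity this paper optimizes, has a standard measure: Jain's fairness index, which could plausibly serve as a reward signal for such an RL agent. Whether the paper uses this exact metric is an assumption; the sketch below just shows how the index behaves:

```python
def jain_fairness(loads):
    """Jain's fairness index over per-server loads.
    Equals 1.0 when all servers carry equal load, and approaches 1/n
    when a single server carries everything."""
    n = len(loads)
    total = sum(loads)
    if total == 0:
        return 1.0  # nothing to distribute: trivially fair
    return total ** 2 / (n * sum(x * x for x in loads))
```

For four servers, a perfectly even split scores 1.0, while routing every request to one server scores 0.25, giving a smooth signal an agent can climb.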
Towards Intelligent Load Balancing in Data Centers
Yao, Zhiyuan, Desmouceaux, Yoann, Townsley, Mark, Clausen, Thomas Heide
Network load balancers are important components in data centers that provide scalable services. Workload distribution algorithms are based on heuristics, e.g., Equal-Cost Multi-Path (ECMP) and Weighted-Cost Multi-Path (WCMP), or naive machine learning (ML) algorithms, e.g., ridge regression. Advanced ML-based approaches help achieve performance gains in various networking and system problems. However, applying ML algorithms to networking problems in real-life systems is challenging: it requires domain knowledge to collect features from low-latency, high-throughput, and scalable networking systems, which are dynamic and heterogeneous. This paper proposes Aquarius to bridge the gap between ML and networking systems, and demonstrates its usage in the context of network load balancers, covering both offline data analysis and online model deployment in realistic systems. The results show that the ML model trained and deployed using Aquarius improves load balancing performance, yet they also reveal further challenges to be resolved before ML can be broadly applied to networking systems.
- Oceania > Australia > New South Wales > Sydney (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Information Technology > Services (0.85)
- Energy > Power Industry (0.62)
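The ECMP/WCMP heuristics this abstract contrasts with ML approaches amount to hashing each flow onto a (possibly weighted) set of next hops. A minimal sketch, assuming small integer weights and using MD5 purely as a stable hash (real implementations hash packet header fields in hardware):

```python
import hashlib

def wcmp_pick(flow_key: str, servers: dict) -> str:
    """WCMP-style selection: hash the flow key onto a weight-expanded
    server list, so the same flow always reaches the same backend.
    With all weights equal, this degenerates to ECMP."""
    buckets = [name
               for name, weight in sorted(servers.items())
               for _ in range(weight)]
    digest = hashlib.md5(flow_key.encode()).hexdigest()
    return buckets[int(digest, 16) % len(buckets)]
```

The appeal of these heuristics is that they are stateless and deterministic per flow; their limitation, which motivates ML-based alternatives like the one above, is that they ignore actual server load.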
How to run machine learning at scale -- without going broke
Machine learning is computationally expensive -- and because serving real-time predictions means running your ML models in the cloud, that computational expense translates into real dollars. For example, if you wanted to add a translation feature to your app that automatically translated text to your user's preferred language, you would deploy an NLP model as a web API for your app to consume. To host this API, you would need to deploy it through a cloud provider like AWS, put it behind a load balancer, and implement some kind of autoscaling functionality (probably involving Docker and Kubernetes). None of the above is free, and if you're dealing with a large amount of traffic, the total cost can get out of hand. This is especially true if you aren't optimizing your spend.
End to End Machine Learning: From Data Collection to Deployment
This started out as a challenge. With a friend of mine, we wanted to see if it was possible to build something from scratch and push it to production. In this post, we'll go through the necessary steps to build and deploy a machine learning application. This starts from data collection and goes all the way to deployment, and the journey, as you'll see, is exciting and fun. Before we begin, let's have a look at the app we'll be building: As you see, this web app allows a user to evaluate random brands by writing reviews. While writing, the user will see the sentiment score of their input updating in real time, along with a proposed rating from 1 to 5. The user can then change the rating in case the suggested one does not reflect their views, and submit. You can think of this as a crowdsourcing app of brand reviews with a sentiment analysis model that suggests ratings that the user can tweak and adapt afterwards. To build this application we'll follow these steps: All the code is available in our github repository and organized in independent directories, so you can check it, run it and improve it. Disclaimer: The scripts below are meant for educational purposes only: scrape responsibly. In order to train a sentiment classifier, we need data. We could certainly download open-source datasets for sentiment analysis tasks such as Amazon Polarity or IMDB movie reviews, but for the purpose of this tutorial, we'll build our own dataset.
- Workflow (0.46)
- Instructional Material (0.34)
- Information Technology > Communications > Social Media (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.89)
- Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.89)
A load balancer that learns, WebTorch – UnifyID
In my previous blog post, "How I stopped worrying and embraced docker microservices", I talked about why microservices are the bee's knees for scaling machine learning in production. A fair amount of time has passed (almost a year, whoa), and it has proved that building Deep Learning pipelines in production is a more complex, multi-aspect problem. Yes, microservices are an amazing tool, both for software reuse, distributed systems design, quick failure and recovery, yada yada. But what seems very obvious now is that machine learning services are very stateful, and statefulness is a problem for horizontal scaling. An easy way to deal with this issue is to understand that ML models are large, and thus should not be context-switched.
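One standard way to avoid context-switching large models, though the post doesn't name it, is sticky routing via consistent hashing: a given session or model always maps to the same server, so the model stays resident in that server's memory. A minimal sketch with invented server names:

```python
import bisect
import hashlib

class ConsistentHashRouter:
    """Sticky routing for stateful model servers: each key always maps to
    the same server, so a large model loaded there never has to be swapped."""
    def __init__(self, servers, replicas=100):
        # Place several virtual points per server on the hash ring
        # so load spreads evenly.
        self.ring = []
        for server in servers:
            for i in range(replicas):
                self.ring.append((self._hash(f"{server}#{i}"), server))
        self.ring.sort()

    @staticmethod
    def _hash(key):
        return int(hashlib.sha1(key.encode()).hexdigest(), 16)

    def route(self, session_key):
        # First ring point clockwise from the key's hash (wrapping around).
        idx = bisect.bisect(self.ring, (self._hash(session_key),)) % len(self.ring)
        return self.ring[idx][1]
```

The consistent-hashing property also means that adding or removing a server only remaps a small fraction of sessions, rather than reshuffling every model across the fleet.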